A Korean Homonym Disambiguation System Based on Statistical Model Using Weights
Abstract
A homonym can be disambiguated by other words in its context, such as the nouns and predicates used with it. This paper uses semantic information (co-occurrence data) obtained from the definitions in the part-of-speech (POS) tagged UMRD-S. We analyze the results of an experiment on a homonym disambiguation system based on a statistical model to which Bayes' theorem is applied, and propose a model that adds a weight on the sense rate and a weight on the distance to adjacent words in order to improve accuracy. Applying the system, with this semantic information, to homonyms appearing in dictionary definition sentences gave an average accuracy of 98.32% for the 200 most frequent homonyms. We then selected 49 of those 200 homonyms (31 substantives and 18 predicates) and ran an experiment on 50,703 sentences, each containing one of the 49 homonyms, extracted from the 3.5-million-word Sejong Project tagged corpus (i.e. a corpus of morphologically analyzed words). Assigning a weight to the sense rate (prior probability) and a distance weight to the 5 words before and after the homonym to be disambiguated improved accuracy by 2.93% over disambiguation systems based on existing statistical models.
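The abstract does not give the exact formula, so the following is only a minimal sketch of how a Bayes-style sense chooser with a sense-rate weight and a distance weight might look. The class name `WeightedBayesDisambiguator`, the `1/distance` weighting, the add-k smoothing, the use of the prior weight as an exponent, and the romanized toy data are all assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


class WeightedBayesDisambiguator:
    """Toy naive Bayes sense chooser with a prior weight and distance weights."""

    def __init__(self, prior_weight=1.0, window=5, smoothing=1.0):
        self.prior_weight = prior_weight  # assumed exponent on the sense rate
        self.window = window              # words considered on each side
        self.smoothing = smoothing        # add-k smoothing (an assumption)
        self.sense_counts = Counter()             # how often each sense occurs
        self.cooc_counts = defaultdict(Counter)   # sense -> co-occurring word counts
        self.vocab = set()

    def train(self, examples):
        """examples: iterable of (sense, list_of_context_words)."""
        for sense, words in examples:
            self.sense_counts[sense] += 1
            self.cooc_counts[sense].update(words)
            self.vocab.update(words)

    def distance_weight(self, distance):
        # Hypothetical choice: closer words count more (1, 1/2, 1/3, ...).
        return 1.0 / distance

    def score(self, sense, context):
        """context: list of (word, distance), distance = 1 for an adjacent word."""
        total = sum(self.sense_counts.values())
        # Weighted log prior (the "sense rate").
        result = self.prior_weight * math.log(self.sense_counts[sense] / total)
        counts = self.cooc_counts.get(sense, Counter())
        denom = sum(counts.values()) + self.smoothing * len(self.vocab)
        for word, distance in context:
            if not 1 <= distance <= self.window:
                continue  # ignore words outside the window
            prob = (counts[word] + self.smoothing) / denom
            # Distance-weighted log likelihood of the context word.
            result += self.distance_weight(distance) * math.log(prob)
        return result

    def disambiguate(self, context):
        """Return the sense with the highest weighted log score."""
        return max(self.sense_counts, key=lambda s: self.score(s, context))


# Toy training data for a two-sense homonym (invented for illustration).
model = WeightedBayesDisambiguator()
model.train([
    ("bae_fruit", ["eat", "sweet", "fruit"]),
    ("bae_fruit", ["eat", "tree"]),
    ("bae_ship", ["sea", "sail", "port"]),
    ("bae_ship", ["sea", "captain"]),
])
print(model.disambiguate([("sea", 1), ("eat", 4)]))
```

In this sketch an adjacent word contributes its full log likelihood while a word four positions away contributes a quarter of it, so nearby evidence dominates; setting `prior_weight` above or below 1 strengthens or weakens the pull toward the more frequent sense.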
Similar resources
TAKTAG: Two-phase learning method for hybrid statistical/rule-based part-of-speech disambiguation
Both statistical and rule-based approaches to part-of-speech (POS) disambiguation have their own advantages and limitations. Especially for Korean, the narrow windows provided by hidden Markov model (HMM) cannot cover the necessary lexical and long-distance dependencies for POS disambiguation. On the other hand, the rule-based approaches are not accurate and flexible to new tag-sets and language...
Rule-based Approach to Korean Morphological Disambiguation Supported by Statistical Method
Korean as an agglutinative language shows its proper types of difficulties in morphological disambiguation, since a large number of its ambiguities comes from the stemming while most of ambiguities in French or English are related to the categorization of a morpheme. The current Korean morphological disambiguation systems adopt mainly statistical methods and some of them use rules in the postpr...
Disambiguation of Korean utterances using automatic intonation recognition
The paper describes a research on a use of intonation for disambiguating utterance types of Korean spoken sentences. Based on tilt intonation theory [8], two related but separate experiments were performed, both using the Hidden Markov Model training technique. In the first experiment, a system is established so that rough boundary positions of major intonation events are detected. Subsequently...
Word Sense Disambiguation In A Korean-To-Japanese MT System Using Neural Networks
This paper presents a method to resolve word sense ambiguity in a Korean-to-Japanese machine translation system using neural networks. The execution of our neural network model is based on the concept codes of a thesaurus. Most previous word sense disambiguation approaches based on neural networks have limitations due to their huge feature set size. By contrast, we reduce the number of features...
Resolving Sense Ambiguity of Korean Nouns Based on Concept Co-occurrence Information
From the view point of the linguistic typology, Korean and Japanese have many grammatical similarities which enable it to easily construct a sense-tagged Korean corpus through an existing high-quality Japanese-to-Korean machine translation system. The sense-tagged corpus may serve as a knowledge source to extract useful clues for word sense disambiguation (WSD). This paper addresses a disambigu...